Using Distributional Semantic Models and Levenshtein Distance Normalization

Authors

  • Lisa Tengstrand
  • Beáta Megyesi
  • Martin Duneld
Abstract

In the medical domain, and especially in clinical texts, non-standard abbreviations are prevalent, which impairs readability for patients. To make physicians’ notes easier to understand, abbreviations need to be identified and expanded to their original forms. This thesis presents a distributional semantic approach to finding candidates for the original form of an abbreviation, combined with Levenshtein distance to choose the correct candidate among the semantically related words. The method is applied to radiology reports and medical journal texts, and a comparison is made to general Swedish. The results show that the correct expansion of the abbreviation can be found in 40% of the cases, an improvement of 24 percentage points over the baseline (0.16) and of 22 percentage points over using word space models alone (0.18).
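The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Levenshtein distance is normalized or how it is combined with semantic similarity, so the choices here are assumptions: the distance is divided by the length of the longer string, and candidates are sorted by that value with semantic similarity as a tie-breaker. The function names, the example abbreviation, the candidate words, and the similarity scores are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalized_distance(abbreviation: str, candidate: str) -> float:
    """Assumed normalization: divide by the length of the longer string."""
    longest = max(len(abbreviation), len(candidate))
    return levenshtein(abbreviation, candidate) / longest if longest else 0.0


def rerank(abbreviation: str, candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (word, semantic_similarity) pairs by normalized distance,
    breaking ties in favour of higher semantic similarity."""
    return sorted(
        candidates,
        key=lambda c: (normalized_distance(abbreviation.lower(), c[0].lower()), -c[1]),
    )


# Hypothetical candidates, as if returned by a word space model for "pat".
candidates = [("läkare", 0.71), ("patient", 0.64), ("sjuksköterska", 0.58)]
best_word, best_similarity = rerank("pat", candidates)[0]
print(best_word)
```

In this made-up example the string-distance step promotes "patient" over a candidate with a higher semantic similarity, which is the kind of correction the combined method is meant to provide.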

Similar resources

Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints

We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus. Our data is a collection of Russian and Ukrainian academic texts, for which topics are their academic fields. In order to build language-independent semantic representations of these documents, we train neural distribut...

NTNU-CORE: Combining strong features for semantic similarity

The paper outlines the work carried out at NTNU as part of the *SEM’13 shared task on Semantic Textual Similarity, using an approach which combines shallow textual, distributional and knowledge-based features by a support vector regression model. Feature sets include (1) aggregated similarity based on named entity recognition with WordNet and Levenshtein distance through the calculation of maxi...

Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization

Edit distance metrics are widely used for many applications such as string comparison and spelling error correction. Hamming distance is a metric for two equal-length strings, and Damerau-Levenshtein distance is a well-known metric for making spelling corrections through string-to-string comparison. Previous distance metrics seem to be appropriate for alphabetic languages like English and Eur...

Disease Named Entity Recognition and Normalization using Conditional Random Fields and Levenshtein Distance

This paper presents a machine learning-based approach for the disease named entity recognition and normalization (DNER) subtask of the Chemical Disease Relation (CDR) task in BioCreative V. The approach employs a Conditional Random Fields (CRF) based model with domain-specific features from the biomedical area for disease named entity recognition. In order to improve the performance of entity normalization, the me...

EACL - Expansion of Abbreviations in CLinical text

In the medical domain, especially in clinical texts, non-standard abbreviations are prevalent, which impairs readability for patients. To ease the understanding of the physicians’ notes, abbreviations need to be identified and expanded to their original forms. We present a distributional semantic approach to find candidates of the original form of the abbreviation, and combine this with Levensh...

Publication date: 2014